By defining the phenotype of a biopolymer by its active three-dimensional shape, and its
genotype by its primary sequence, we propose a model that predicts and characterizes the
statistical distribution of a population of biopolymers with a specific phenotype, that
originated from a given genotypic sequence by a single mutational event. Depending on the
ratio $g_0$ that characterizes the spread of potential energies of the mutated population
with respect to temperature, three different statistical regimes have been identified. We
suggest that biopolymers found in nature are in a critical regime with $g_0 \simeq 1$ to $6$,
corresponding to a broad, but not too broad, phenotypic distribution resembling a truncated
Lévy flight. Thus the biopolymer phenotype can be considerably modified in just a few
mutations. The proposed model is in good agreement with the experimental distribution of
activities determined for a population of single mutants of a group I ribozyme.



PACS numbers: 87.15.He, 87.15.Cc, 05.40.Fb, 87.23.Kg

I. INTRODUCTION 

The biological function (or phenotype) of a biopolymer, such as a ribonucleic acid (RNA)
or a protein, is mostly determined by the three-dimensional structure resulting from the
folding of the linear sequence of nucleotides (RNA) or amino acids (proteins) that
specifies a genotype. Generally, a natural biopolymer sequence (or genotype) codes for a
specific two-dimensional or three-dimensional structure that defines the biopolymer
activity. But one sequence can simultaneously fold into several metastable structures that
can lead to different phenotypes. Thus, random mutations of a sequence induce random
changes of the metastable structure populations, which generates a random walk of the
biopolymer function. Understanding this phenotype random walk is a basic goal for
"quantitative" biomolecular evolution.

The statistical properties of RNA secondary structures considered as a model for
genotypes have been investigated in depth in recent years [1]. The neutral network concept
[ref.], i.e., the notion of a set of sequences, connected through point mutations, having
roughly the same phenotype, has been shown to apply to RNA secondary structures. Thus, by
drifting rapidly along the neutral network of its phenotype, a sequence may come close to
another sequence with a qualitatively different phenotype, which facilitates the
acquisition of new phenotypes through random evolution. Moreover, in the close vicinity of
any sequence with a given structure, there exist sequences with nearly all other possible
structures, as originally proposed in immunology [ref.]. Thus, even if the sequence space
is much too vast to be explored through random mutations in a reasonable time (an RNA with
only 100 bases has $10^{60}$ possible sequences), the phenotype space itself may be explored
in only a few mutations, which is what matters biologically. These ideas have been brought
into operation in a recent experiment [ref.] showing that a particular RNA sequence,
catalyzing a given reaction, can be transformed into a sequence having a qualitatively
different activity, using a small number of mutations and without ever going through
inactive steps.

This paper investigates the phenotype space exploration at an elementary level by
studying the statistical distribution of a population of biopolymers in a specific
three-dimensional shape, that originated from a given genotypic sequence by a single
mutational event. It complements studies of the evolution from one structure to another
structure [ref.], that consider only the most stable structure for each sequence and
neglect the thermodynamical coexistence of different structures for the same sequence. It
also provides more grounds for the recent work suggesting that RNA molecules with novel
phenotypes evolved from plastic populations, i.e., populations folding in several
structures, of known RNA molecules. It is experimentally evident, for instance in [6],
that some mutations change the biopolymer chemical activity by a few percent while other
mutations change it by orders of magnitude. This is not unexpected since, depending on
their positions in the sequence, some residues have a dramatic influence on the 3D
conformation while others hardly matter. Thus, the function random walk statistically
resembles a Lévy flight [refs.], presenting jumps at very different scales. The respective
roles of gradual changes and of sudden jumps in biological evolution are a highly debated
issue. While the gradualist point of view has historically dominated, evidence for the
presence of jumps has accumulated at various hierarchical levels from paleontology [12],
to trophic systems, chemical reaction networks and neutral networks and molecular
structure [ref.]. The jump issue will be treated here by studying the statistical
distribution describing the phenotype effects of random mutations of a biopolymer
genotype.

To address the question of the statistical effects of random mutations of functionally
active biopolymers, we propose a model inspired by disordered systems physics that
naturally predicts the possibility of broad distributions of activities of randomly
mutated biopolymers. With two energy parameters describing the polymer energy landscape,
this model is shown to exhibit a variety of behaviors and to fit experimental data.
Natural biopolymers are in a critical regime, related to the activity distribution
broadness, in which a single mutation may have a large, but not too large, effect.



II. PHYSICAL MODEL OF SHAPE 
POPULATION DISTRIBUTION 

The most favorable conformational state of a biopolymer sequence with a given biological
activity is generally considered to be the most stable one within the sequence energy
landscape. The ruggedness of the energy landscape might vary depending on the number of
other metastable, conformational states accessible by the sequence. The typical energy
spacing between these states can be small enough so that several states of low energy can
be populated. For simplicity, we will consider a sequence that is able to fold into its
two lowest energy conformational states, an active state A of specific biological
function, and an inactive state I of unknown function [25], but whose energy is the
closest to A's (higher or lower) (see figure 1). The differences between the free energies
of the unfolded and folded states for A and I are denoted $\Delta G_A$ and $\Delta G_I$,
respectively.

A mutation, i.e., a random change in the biopolymer sequence, modifies the biopolymer
energy landscape so that $\Delta G_A$ and $\Delta G_I$ are transformed into
$(\Delta G_A)_m$ and $(\Delta G_I)_m$. Note that the conformer state A of the mutant, its
three-dimensional shape, is the same as before whereas the conformer state I does not
have to be the same as before. To take into account the randomness of the mutational
process, the mutant free energy difference $\delta G_m = (\Delta G_I)_m - (\Delta G_A)_m$
is taken either with a Gaussian distribution:

$$P_G(\delta G_m) = \frac{1}{\sqrt{2\pi}\,\delta G_0} \exp\left[-\frac{(\delta G_m - \overline{\delta G})^2}{2\,\delta G_0^2}\right] \qquad (1)$$

or with a two-sided exponential (Laplace) distribution

$$P_e(\delta G_m) = \frac{1}{2\,\delta G_0} \exp\left[-\frac{|\delta G_m - \overline{\delta G}|}{\delta G_0}\right] \qquad (2)$$

where $\overline{\delta G}$ is the mean of $\delta G_m$ and where $\delta G_0$
characterizes the width of the distribution. These two energy distributions are commonly
used for disordered systems [13] and enable us to cover a range of situations from narrow
(Gaussian) to relatively broad (exponential) distributions. Assuming thermodynamic rather
than kinetic control, the populations $\pi_A$ and $\pi_I = 1 - \pi_A$ of conformers A and
I, respectively, are given by Boltzmann statistics:

$$\pi_A = \frac{1}{1 + e^{-\delta G_m/RT}}, \qquad (3)$$




FIG. 1: Schematic representations of the molecular energy landscapes. (a) For the
non-mutated molecule. (b) For the mutated molecule. Only the two lowest energy
conformations, A (active) and I (inactive), are taken into account. Their 3D
conformations are indicated symbolically. The shaded dots indicate the populations
$\pi_A$ and $\pi_I$ at thermal equilibrium.



where R is the gas constant and T is the temperature. 

From the distributions of free energy differences and eq. (3), one infers the probability
distributions $P_{e\,\mathrm{or}\,G}(\pi_A)$ of the population of conformer state A after
a mutation using
$P_{e\,\mathrm{or}\,G}(\pi_A) = P_{e\,\mathrm{or}\,G}(\delta G_m) \times |d\delta G_m/d\pi_A|$.
For the Gaussian model, one obtains:

$$P_G(\pi_A) = \frac{1}{\sqrt{2\pi}\,g_0\,\pi_A(1-\pi_A)} \exp\left[-\frac{\left(\ln\frac{\pi_A}{1-\pi_A} - g\right)^2}{2 g_0^2}\right], \qquad (4)$$

where $g = \overline{\delta G}/(RT)$, $g_0 = \delta G_0/(RT)$. The ratio $g_0$ of the
scale of energy fluctuations and of the thermal energy appears frequently in the study of
the anomalous kinetics of disordered systems. For the exponential model, one obtains:



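$$P_e(\pi_A) = \begin{cases}
\dfrac{e^{-g/g_0}}{2 g_0}\, \dfrac{\pi_A^{-1+1/g_0}}{(1-\pi_A)^{1+1/g_0}} & \text{for } \pi_A < \pi_m, \quad (5a) \\[2ex]
\dfrac{e^{g/g_0}}{2 g_0}\, \dfrac{(1-\pi_A)^{-1+1/g_0}}{\pi_A^{1+1/g_0}} & \text{for } \pi_A > \pi_m, \quad (5b)
\end{cases}$$

with the same definitions for $g$ and $g_0$, and $\pi_m = (1 + e^{-g})^{-1}$ (median
population of A). Note that changing $g$ into $-g$ is equivalent to performing a symmetry
on $P_{e\,\mathrm{or}\,G}(\pi_A)$ by replacing $\pi_A$ by $1 - \pi_A$.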
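The change of variables described above can be sketched numerically. The following minimal Python example applies $P(\pi_A) = P(\delta G_m)\,|d\delta G_m/d\pi_A|$ with the Boltzmann relation of eq. (3) to both energy models, then checks normalization and the $g \to -g$ symmetry. The function names and the midpoint integration are choices of this sketch, not part of the original work, and the uniform grid in $\pi_A$ is only adequate for moderate $g_0$ (for large $g_0$ the peaks near 0 and 1 become too sharp to resolve this way).

```python
import math

def logit(pi_a):
    """Reduced free energy difference x = delta G_m / RT for a population pi_A (inverse of eq. 3)."""
    return math.log(pi_a / (1.0 - pi_a))

def p_gauss(pi_a, g, g0):
    """Population density for the Gaussian energy model: Gaussian in x times the Jacobian."""
    jacobian = 1.0 / (pi_a * (1.0 - pi_a))  # |dx / d pi_A|
    x = logit(pi_a)
    return jacobian * math.exp(-(x - g) ** 2 / (2.0 * g0 ** 2)) / (math.sqrt(2.0 * math.pi) * g0)

def p_laplace(pi_a, g, g0):
    """Population density for the two-sided exponential model (both branches in one expression)."""
    jacobian = 1.0 / (pi_a * (1.0 - pi_a))
    x = logit(pi_a)
    return jacobian * math.exp(-abs(x - g) / g0) / (2.0 * g0)

def normalization(pdf, g, g0, n=200_000):
    """Midpoint-rule integral of pdf over 0 < pi_A < 1 (hypothetical helper, moderate g0 only)."""
    h = 1.0 / n
    return h * sum(pdf((i + 0.5) * h, g, g0) for i in range(n))

if __name__ == "__main__":
    print("Gaussian model norm:", normalization(p_gauss, -1.0, 1.0))
    print("Laplace model norm: ", normalization(p_laplace, -1.0, 1.0))
```

Both integrals should come out close to 1, and `p_gauss(pi, g, g0)` should equal `p_gauss(1 - pi, -g, g0)`, which is the symmetry noted in the text.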



III. TYPES OF DISTRIBUTIONS 

To analyze the different types of population distributions, we focus for definiteness on
the Gaussian model. A qualitatively similar behavior is obtained for the exponential
model. Figure 2 represents examples of $P_G(\pi_A)$ for the Gaussian model with $g = -1$
and various $g_0$'s. The negative value of $g$ implies that A is on average less stable
than I, and hence that $\pi_A$ is predominantly less than 50%. For small $g_0$, the
distribution $P_G(\pi_A)$ is narrow since the width $\delta G_0$ of the free energy
distribution is small compared to $RT$ so that there are only small fluctuations of
population around the most probable value. When the energy broadness $g_0$ increases, the
single narrow peak first broadens until, when $g_0 > 1.976$, it splits into two peaks,
close respectively to $\pi_A \simeq 0$ and to $\pi_A = 1$. The broad character of
$P_G(\pi_A)$ can be intuitively understood as a consequence of the nonlinear dependence of
$\pi_A$ on $\delta G_m$. Thus, when the fluctuations of $\delta G_m$ are larger than $RT$,
i.e., when $g_0 > 1$, the quasi exponential dependence of $\pi_A$ on $\delta G_m$
(eq. (3)) nonlinearly magnifies $\delta G_m$ fluctuations to yield a broad $\pi_A$
distribution, even if $\delta G_m$ fluctuations are relatively small compared to the mean
$\overline{\delta G}$. A similar mechanism is at work for tunneling in disordered systems
[refs.].

FIG. 2: Distributions $P_G(\pi_A)$ of shape populations of mutated molecules for
$g = -1$. They are narrow and single peaked for small enough $g_0$ and broad and double
peaked for large enough $g_0$. The transition from one to two peaks occurs at
$g_0 \simeq 1.976$ in agreement with eq. (6). Inset: logarithmic plot of $P_G(\pi_A)$ for
$g_0 = 3$ showing the broad character of the small $\pi_A$ peak.

A global view of the possible shapes of $P_G(\pi_A)$ is given in figure 3. For any given
$g$, when increasing $g_0$ starting from 0, the single narrow peak of $P_G(\pi_A)$ first
broadens, then splits into two peaks when $g = g_{\mathrm{sign}(g)}(g_0)$ with

$$g_\pm(g_0) = \pm\left[ g_0\sqrt{g_0^2 - 2} + \ln\left(\frac{g_0 - \sqrt{g_0^2 - 2}}{g_0 + \sqrt{g_0^2 - 2}}\right)\right]. \qquad (6)$$

(This expression results from a lengthy but straightforward study of $P_G(\pi_A)$.) When
$g_0$ increases further, these two peaks get closer to $\pi_A \simeq 0$ and to
$\pi_A \simeq 1$ while acquiring significant tails (see section VII). For any given $g_0$,
increasing $g$ roughly amounts to moving the populations $\pi_A$ towards larger values, as
expected since larger $g$'s correspond to stabler states A. However, distinct behaviours
arise depending on $g_0$. If $g_0 < \sqrt{2}$, whatever the value of $g$, the distribution
$P_G(\pi_A)$ is always sufficiently narrow to present a single peak. If $g_0 > \sqrt{2}$,
the distribution $P_G(\pi_A)$ is sufficiently broad to have two peaks when, furthermore,
the distribution is not too asymmetric, which occurs for $g \in [g_-(g_0), g_+(g_0)]$. In
short, depending on $g = \overline{\delta G}/RT$, which characterizes mainly the peak(s)
position, and on $g_0 = \delta G_0/RT$, which characterizes mainly the distribution
broadness, the distributions $P_G(\pi_A)$ are either unimodal or bimodal, either broad or
narrow. This variety of behaviors is reminiscent of beta distributions.

FIG. 3: Possible shapes of $P_G(\pi_A)$. The shaded area indicates the two-peaks region.
The dashed line gives the transition from one to two peaks (cf. eq. (6)). Insets show
examples of $P_G(\pi_A)$ corresponding to the $g_0$ and $g$ indicated by the black dots
($P_G$'s not to scale). The black square corresponds to the fit of the figure 4 data.
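The one-peak to two-peak transition can be checked numerically. In the sketch below, the stationarity condition $\tanh(x/2) = (x - g)/g_0^2$ with $x = \ln[\pi_A/(1-\pi_A)]$ is this example's own derivation from the Gaussian model combined with the Boltzmann relation, and the closed form in `g_plus` follows from it; both are assumptions to be compared against eq. (6), and the function names are hypothetical. The code counts the maxima of $P_G(\pi_A)$ on a grid and locates by bisection the $g_0$ at which the boundary reaches $|g| = 1$.

```python
import math

def g_plus(g0):
    """Upper boundary g_+(g0) of the two-peak region (assumed closed form, valid for g0 > sqrt(2))."""
    if g0 <= math.sqrt(2.0):
        raise ValueError("two peaks are only possible for g0 > sqrt(2)")
    s = math.sqrt(g0 * g0 - 2.0)
    return g0 * s + math.log((g0 - s) / (g0 + s))

def count_peaks(g, g0, n=20_000):
    """Count maxima of P_G(pi_A) via sign changes (+ to -) of d ln P_G / dx."""
    lo, hi = g - g0 ** 2 - 1.0, g + g0 ** 2 + 1.0  # all stationary points lie in this window
    peaks, prev = 0, None
    for i in range(n + 1):
        x = lo + (hi - lo) * i / n
        slope = math.tanh(x / 2.0) - (x - g) / g0 ** 2
        if prev is not None and prev > 0.0 >= slope:
            peaks += 1
        prev = slope
    return peaks

def transition_g0(abs_g, hi=50.0, iterations=100):
    """Bisection for the g0 at which g_+(g0) equals |g| (g_+ increases with g0)."""
    lo = math.sqrt(2.0) + 1e-9
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g_plus(mid) < abs_g:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print("transition for g = -1 at g0 =", transition_g0(1.0))
    print("peaks at g0 = 1.5 and 2.5:", count_peaks(-1.0, 1.5), count_peaks(-1.0, 2.5))
```

For $g = -1$ the bisection should land near the value $g_0 \simeq 1.976$ quoted in the text, with one peak below it and two above.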



IV. FROM SHAPE POPULATIONS TO 
CATALYTIC ACTIVITIES 

Up to now, we have discussed the distribution $P(\pi_A)$ of the population of a shape A
that is functionally active. However, as far as biopolymers with enzymatic functions are
concerned, what is usually measured is a chemical activity $a$, i.e., the product of a
reaction rate $k$ for the conformer A by the population $\pi_A$ of this conformer. The
reaction rates are given by the Arrhenius law $k = k_0 e^{-E_a/RT}$ where $k_0$ is a
constant and $E_a$ is the activation energy. Thus, the chemical activity reads, using
eq. (3):

$$a = k\,\pi_A = \frac{k_0\, e^{-E_a/RT}}{1 + e^{-\delta G_m/RT}}. \qquad (7)$$

Random mutations may induce random modifications of $E_a$, $\delta G_m$ or both.
Fluctuations of $\delta G_m$ have been treated above. One can introduce fluctuations of
$E_a$ in the same way. We do not do it here in detail but present only the general trends.

The effects of adding an activation energy distribution in addition to the free energy
difference distribution are twofold. For small activities, the distribution $P(a)$ of
chemical activities is similar to the small $P(\pi_A)$ peak at small $\pi_A$. Indeed, the
reaction rate $k$ depends exponentially on $E_a$, just as the population $\pi_A$ depends
exponentially on $\delta G_m$ when $\pi_A \ll 1$. Moreover, the product of two broadly
distributed random variables is also broadly distributed [26] with a shape similar to the
one of $P(\pi_A)$. For large activities, on the other hand, $\pi_A$ and $k$ behave
differently because $\pi_A$ is bounded by 1 while $k$ is unbounded. Thus, if the $k$
distribution is broad enough, the distribution of $a$ at large $a$ may exhibit a broadened
structure compared to the $\pi_A \simeq 1$ peak of $P(\pi_A)$.

In summary, the distribution of chemical activities $P(a)$ is similar to the distribution
of shape populations $P(\pi_A)$ when $P(\pi_A)$ presents a large $\pi_A \simeq 0$ peak
(conditions for this to occur are spelled out in section VI). Thus, by observing the shape
of the $a \simeq 0$ peak in the activity distribution $P(a)$, one does not easily
distinguish between activation energy dispersion, which affects $k$, and free energy
difference dispersion, which affects $\pi_A$. On the other hand, at large $a$, $P(a)$ is
differently influenced by activation energy dispersion and by free energy difference
dispersion. The available experimental data (see section V) enable us to analyze $P(a)$
precisely at small activities but not at large activities. Thus, for practical purposes,
it is not meaningful in this paper to consider a distribution of activation energies on
top of a distribution of free energy differences. In the sequel, we will thus proceed as
if only the distribution of free energies was involved, stressing that similar effects can
be obtained from a distribution of activation energies.



V. ANALYSIS OF EXPERIMENTAL DATA 

Comparison of the theoretical distributions of eq. (4) and eq. (5) with experimental data
enables us to test the relevance of the proposed model. We have analyzed the measurements
of the catalytic activities of a set of 157 mutants derived from a self-splicing group I
ribozyme, a catalytic RNA molecule [16] (out of the 345 mutants generated in [16], we
only considered the 157 with single point mutations). The original "wild-type" molecule
is formed of a conserved catalytic core that catalyzes the cleavage of another part of the
molecule considered as the substrate. The set of mutants is derived from the original
ribozyme by systematically performing all single point mutations of the catalytic core,
i.e., of the part of the molecule that most influences the catalytic activity. Nucleotides
outside the core, which in general have less influence on the catalytic activity, are left
unmutated. Thus, in our framework, this set of mutants can be seen as biased towards
deleterious mutations. Indeed, mutations of the quasi optimized core are likely to lead to
much less active mutants, while mutations of remote parts are likely to leave the activity
essentially unchanged. If all parts of the molecule had been mutated, more neutral or
quasi neutral mutations would have been obtained. Another point of view, which we adopt
here, is to consider the catalytic core as a molecule in itself, on which all possible
single point mutations have been performed.

FIG. 4: Analysis of an experimental distribution of activities. Experimental data are
derived from [16]. Error bars give the one standard deviation statistical uncertainty.
The solid line is a two parameter fit ($g$, $g_0$) to the model of Gaussian energy
distribution. The dashed lines correspond to the same $g_0$ and modified $g$'s, which
allows the uncertainty on $g$ to be estimated. Inset: comparison of the data to the model
of exponential energy distribution ($g_0$ and $g$ are not fitted again but taken from the
Gaussian model fit).

The 157 measured activities are used to calculate a population distribution with
inhomogeneous binning (cf. broad distribution). Two bins required special treatment: the
smallest bin, centered at 0.5%, contains 40 mutants with non measurably small activities
(< 1% of the original activity); the largest bin, centered at 95%, contains the 6 mutants
with activities larger than 90% of the original 'wild' RNA activity (the largest measured
mutant activity is 140%). These two points, whose abscissae are arbitrary within an
interval, are not essential for the obtained results. Finally, as very few mutants have
activities larger than the wild-type ribozyme, the proportionality constant between
activity and population is set by matching a population $\pi_A = 1$ to the activity of the
wild-type ribozyme.

The obtained distribution (see figure 4) has a large peak at $\pi_A \simeq 0$, indicating
that most mutations are deleterious, with a long tail at larger activities and a possible
smaller peak at $\pi_A \simeq 1$. This non trivial shape is well fitted by the Gaussian
model of eq. (4) with $g \simeq -3.6$ and $g_0 = 2.9$ (the uncertainty on these parameters
is about 50%, see dashed lines in figure 4). One infers
$\overline{\delta G} \simeq -2.1$ kcal/mol and $\delta G_0 \simeq 1.7$ kcal/mol
($T = 300$ K). The order of magnitude of these values is compatible with thermodynamic
measurements performed on similar systems [refs.]. This confirms the plausibility of the
proposed approach. The inset of figure 4 shows the population distribution in the
exponential model with $g$ and $g_0$ values taken from the Gaussian fit. The agreement
with the experimental data is also quite good. Thus, the proposed approach is robust: it
does not strongly depend on the yet unknown shape details of the energy distribution.



Finally, one can estimate the broad character of the activity distribution from the
statistical analysis of the experimental data. Indeed, according, e.g., to the Gaussian
model fit, the typical, most probable, population $\pi_A$ is found to be
$\simeq 6 \times 10^{-6}$ while the mean population is $\simeq 0.15$. Thus, the activity
distribution spans more than four orders of magnitude.
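These numbers can be reproduced approximately from the fitted parameters alone. The sketch below converts $g$ and $g_0$ into energies at 300 K, estimates the most probable population as the mode $e^{g - g_0^2}$ of a lognormal approximation to the small-$\pi_A$ peak, and computes the mean population by direct integration over the Gaussian energy distribution. The value of the gas constant, the lognormal-mode shortcut and all function names are assumptions of this example.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K), assumed value

def to_kcal_per_mol(reduced_energy, temperature_k=300.0):
    """Convert a reduced energy (in units of RT) into kcal/mol."""
    return reduced_energy * R_KCAL * temperature_k

def most_probable_population(g, g0):
    """Mode of the lognormal approximation to the small-pi_A peak: exp(g - g0^2)."""
    return math.exp(g - g0 ** 2)

def mean_population(g, g0, n=20_000, span=10.0):
    """Average of pi_A = 1/(1 + exp(-x)) over x ~ Normal(g, g0), midpoint rule on g +/- span*g0."""
    lo = g - span * g0
    h = 2.0 * span * g0 / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        weight = math.exp(-(x - g) ** 2 / (2.0 * g0 ** 2)) / (math.sqrt(2.0 * math.pi) * g0)
        total += weight / (1.0 + math.exp(-x))
    return total * h

if __name__ == "__main__":
    g_fit, g0_fit = -3.6, 2.9  # Gaussian-model fit quoted in the text
    print("mean free energy difference (kcal/mol):", to_kcal_per_mol(g_fit))
    print("free energy dispersion (kcal/mol):     ", to_kcal_per_mol(g0_fit))
    print("most probable population:", most_probable_population(g_fit, g0_fit))
    print("mean population:         ", mean_population(g_fit, g0_fit))
```

The ratio of the mean to the most probable population gives a quick measure of how many orders of magnitude the distribution spans.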



VI. COARSE GRAINING DESCRIPTION: ALL
OR NONE FEATURES



The variation of activity of a biopolymer upon mutation is often described as an 'all or
none' process: mutations are considered either as neutral (the mutant fully retains its
activity and $\pi_A \simeq 100\%$) or as lethal (the mutant completely loses its activity
and $\pi_A \simeq 0\%$). Satisfyingly, a coarse graining description of the proposed
statistical models exhibits such all or none regimes for appropriate ($g$, $g_0$) values,
as well as other regimes.

To obtain a quantitative coarse graining description, we define the mutants with 'no'
activity as those with less than 12% ($\simeq \pi_A(\delta G_m = -2RT)$) of their
population in the A shape. Their weight is

$$w_0 = \int_{-\infty}^{-2RT} P_{e\,\mathrm{or}\,G}(\delta G)\, d\delta G. \qquad (8)$$

Similarly, the mutants with 'full', respectively 'intermediate', activity are defined as
those with $\pi_A > 88\%$, respectively $12\% \le \pi_A \le 88\%$, and their weight is
$w_{100} = \int_{+2RT}^{+\infty} P_{e\,\mathrm{or}\,G}(\delta G)\, d\delta G$, respectively
$w_i = \int_{-2RT}^{+2RT} P_{e\,\mathrm{or}\,G}(\delta G)\, d\delta G$. Taking for
definiteness the Gaussian model leads to

$$w_0 = \Phi\left[-(2 + g)/g_0\right], \qquad (9)$$

where $\Phi(u) = \int_{-\infty}^{u} e^{-t^2/2}\, dt/\sqrt{2\pi}$ is the distribution
function of the normal distribution. Similarly, one has
$w_i = \Phi[(2 - g)/g_0] - \Phi[-(2 + g)/g_0]$ and $w_{100} = 1 - \Phi[(2 - g)/g_0]$.
Approximate expressions for $\Phi(u)$ ($\Phi(u) \simeq -e^{-u^2/2}/(\sqrt{2\pi}\,u)$ for
$u < -1$, $\Phi(u) \simeq 1/2 + u/\sqrt{2\pi}$ for $|u| < 1$ and
$\Phi(u) \simeq 1 - e^{-u^2/2}/(\sqrt{2\pi}\,u)$ for $u > 1$) give the regimes in which
each weight $w$ is negligible ($w \ll 1$), dominant ($1 - w \ll 1$) or in between. For
instance, $w_0$ is negligible for $g > g_0 - 2$, dominant for $g < -g_0 - 2$ and
intermediate for $-g_0 - 2 < g < g_0 - 2$. These inequalities indicate the transition from
one regime to another. To be strictly within one regime typically requires $g/g_0$ to lie
beyond the corresponding criterion by about 1, e.g., $w_0$ is strictly negligible when
$g/g_0 > 1 + (g_0 - 2)/g_0$. The transitions from one regime to another are in general
exponentially fast (solid lines in figure 5). However, in the region
($g_0 > 2$, $|g| < g_0 - 2$), the transitions from one regime to another are smooth
(dashed lines in figure 5) since, in this region, the weights vary slowly, e.g.,
$w_i \simeq 4/(g_0\sqrt{2\pi})$.
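The three coarse-grained weights of the Gaussian model are straightforward to evaluate with the error function. The following sketch uses the expressions quoted in the text, $w_i = \Phi[(2-g)/g_0] - \Phi[-(2+g)/g_0]$ and $w_{100} = 1 - \Phi[(2-g)/g_0]$, and obtains $w_0$ as the remainder; the function names are hypothetical and the parameter values in the usage part are illustrative.

```python
import math

def phi(u):
    """Distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def coarse_weights(g, g0):
    """Return (w_0, w_i, w_100): weights of pi_A < 12%, 12% to 88%, and > 88%."""
    w_i = phi((2.0 - g) / g0) - phi(-(2.0 + g) / g0)
    w_100 = 1.0 - phi((2.0 - g) / g0)
    w_0 = 1.0 - w_i - w_100  # equals phi(-(2 + g) / g0)
    return w_0, w_i, w_100

if __name__ == "__main__":
    # Parameters of the ribozyme fit quoted in the text.
    print("fit (g = -3.6, g0 = 2.9):", coarse_weights(-3.6, 2.9))
    # Broad, symmetric case: w_i should approach 4 / (g0 * sqrt(2 * pi)).
    g0 = 10.0
    print("w_i exact :", coarse_weights(0.0, g0)[1])
    print("w_i approx:", 4.0 / (g0 * math.sqrt(2.0 * math.pi)))
```

Scanning `coarse_weights` over a grid of ($g$, $g_0$) values is one way to redraw a classification of the kind shown in figure 5.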
FIG. 5: Coarse graining features of the population distribution $P_G(\pi_A)$ in the
Gaussian model. In each region, the population ranges dominating the distribution have
been indicated (0 for $\pi_A < 12\%$, i for $12\% < \pi_A < 88\%$ and 100 for
$\pi_A > 88\%$).



The resulting coarse graining classification of $P_G(\pi_A)$ is represented in figure 5.
The 'all or none' behaviour, denoted '0 & 100', appears in the region
$g_0 > 6/\sqrt{\pi/2}$ and $|g| \lesssim \sqrt{\pi/2}\, g_0 - 6$ as the result of a large
dispersion of energy differences associated with a moderate average energy difference. We
note that all possible types of distributions are actually present in this model:
probabilities concentrated at small, intermediate or large values (0, i or 100);
probabilities spread over both small and intermediate (0 & i), both small and large
(0 & 100, all or none) or both intermediate and large (i & 100) values; probabilities
spread over small, intermediate and large values at the same time (0 & i & 100). The
coarse graining classification of figure 5 complements the number of peaks classification
of figure 3 without overlapping it. Indeed, there exist parameters $g_0$ and $g$ for
which, e.g., two peaks coexist but one of these peaks has a negligible weight. Thus the
presence of a peak is not automatically associated with a large weight in the region of
this peak.



VII. ZOOMING IN THE $\pi_A \simeq 0$ PEAK: LONG
TAILS



To go beyond the coarse graining description, we zoom in on the $\pi_A \simeq 0$ peak. As
shown in the inset of figure 2, the small activities, labelled as 'no activity' in a
coarse graining description, actually consist of non zero activities with values scanning
several orders of magnitude. This can be analyzed quantitatively, e.g., in the Gaussian
model. For $\pi_A \simeq 0$, the activity distribution given by eq. (4) is quasi
lognormal:

$$P_G(\pi_A) \simeq \frac{1}{\sqrt{2\pi}\,g_0\,\pi_A} \exp\left[-\frac{(\ln \pi_A - g)^2}{2 g_0^2}\right]. \qquad (10)$$

Thus, $P_G(\pi_A)$ has a power law like behavior [15, 21]:

$$P_G(\pi_A) \simeq \frac{1}{\sqrt{2\pi}\,g_0\,\pi_A} \qquad (11)$$

in the vicinity of the lognormal median $e^{g}$. This corresponds to an extremely long
tailed distribution, since $1/\pi_A$ is not even normalizable. It presents the peculiarity
that, for $\alpha$ and $\alpha + 1$ belonging to
$[g - \sqrt{2}\,g_0, g + \sqrt{2}\,g_0]$, the probability to obtain a population $\pi_A$
of a given order of magnitude $\alpha$, i.e., $\pi_A \in [e^{\alpha}, e^{\alpha+1}]$, does
not depend on the considered order of magnitude $\alpha$, since

$$\int_{e^{\alpha}}^{e^{\alpha+1}} P_G(\pi_A)\, d\pi_A \simeq \mathrm{const.} \qquad (12)$$



Thus, if a living organism has to adapt the chemical activity of one of its biopolymer
constituents, it can explore several orders of magnitude of activity with only a few
mutations within the biopolymer. The activity changes mimic a Lévy flight [ref.] as
revealed, e.g., by the experimental data in [16]. The large activity changes will raise
self-averaging issues [15] that will add to those generated by correlations along
evolutionary paths [ref.].

Three broadness regimes corresponding to three evolutionary regimes can be distinguished.
If $g_0$ is very large, the mutant activities span a very large range. This regime might
be globally lethal because, in most cases, the mutant activity will be either too low or
too large to be biologically useful. However, under conditions of intense stress, the
large variability might allow the system to evolve radically. With $g_0 = 10$, for
instance, the activity range covers typically 12 orders of magnitude from
$10^{-6} e^{g}$ to $10^{6} e^{g}$ (see eq. (11)). If $g_0$ is moderately large, the mutant
activities span just a few orders of magnitude. This regime is broad enough to permit
significant changes, but not so broad as to produce too many lethal changes. With
$g_0 = 3$, for instance, the activity range covers typically 3 to 4 orders of magnitude
from $10^{-1.8} e^{g}$ to $10^{1.8} e^{g}$. If $g_0$ is small, the lognormal distribution
peak can be approximated by a Gaussian:

$$P_G(\pi_A) \simeq \frac{1}{\sqrt{2\pi}\,g_0\,e^{g}} \exp\left[-\frac{(\pi_A - e^{g})^2}{2 g_0^2 e^{2g}}\right]. \qquad (13)$$

The distribution is now narrow and the range of values is typically
$[e^{g}(1 - 2g_0), e^{g}(1 + 2g_0)]$. This type of distribution is not adapted for
producing large changes, but rather for performing fine tuning optimization. With
$g_0 = 0.1$, for instance, the activity range covers only ±20% around $e^{g}$.
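The orders of magnitude quoted for the three regimes follow from the width of the interval $[g - \sqrt{2}\,g_0, g + \sqrt{2}\,g_0]$ in $\ln \pi_A$, converted to base 10. The sketch below makes this arithmetic explicit; the function names are hypothetical, and the narrow-regime helper simply restates the ±$2g_0$ range given in the text.

```python
import math

def decades_spanned(g0):
    """Width, in powers of ten, of the interval [g - sqrt(2) g0, g + sqrt(2) g0] in ln(pi_A)."""
    return 2.0 * math.sqrt(2.0) * g0 / math.log(10.0)

def narrow_relative_range(g0):
    """Relative half-width of the activity range around e^g in the small-g0 (Gaussian) regime."""
    return 2.0 * g0

if __name__ == "__main__":
    for g0 in (1.0, 3.0, 6.0, 10.0):
        half = decades_spanned(g0) / 2.0
        print(f"g0 = {g0:4.1f}: about {decades_spanned(g0):.1f} decades, "
              f"from 10^-{half:.1f} e^g to 10^+{half:.1f} e^g")
    print(f"g0 = 0.1: about +/-{100.0 * narrow_relative_range(0.1):.0f}% around e^g")
```

With $g_0$ between 1 and 6 the span goes from roughly one to roughly seven decades, which is the window the text associates with the critical regime.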

We remark that the group I ribozyme which we have analyzed corresponds to
$g_0 \simeq 2.9$, right in the critical regime of moderately large $g_0$. One can guess
from experimental studies of other biopolymers or from chemical considerations that most
biopolymers will fall in this range since $\delta G_0$ is typically on the order of a few
kcal/mol while $RT$ is $\simeq 0.6$ kcal/mol. (Note that $\delta G_0$ corresponds to the
free energy change between the biopolymer native 3D state and an unfolded state, in which
the biopolymer has lost its three-dimensional shape but not its full secondary structure.)
It would be interesting to perform further statistical data analysis to see how, e.g., the
available protein mutagenesis studies fit with our present model.

FIG. 6: Free energy dispersion $\delta G_0$ as a function of $g_0$ for different
temperatures $T$. The upper and lower temperature limits for life are, respectively,
$\simeq 121$ °C and $\simeq -20$ °C. Depending on whether the energy statistics is
determined by the biophysics or by evolutionary requirements, the range of either
$\delta G_0$ or of $g_0$ is fixed (see for example the dashed lines). Evolutionary
requirements suggest $1 < g_0 < 6$.

The energy statistics associated with mutations is likely to be determined at gross scale
by the basic biophysics of the molecules involved. This fixes a range for $\delta G_0$. It
is nonetheless plausible and suggested by our discussion that there is an evolutionary
preferred type of activity distribution, and hence of sequences, that may imply a fine
tuning of $g_0 = \delta G_0/RT$ within the constraints on $\delta G_0$ coming from
biophysics (see figure 6) so that each mutation typically generates a significant, but not
systematically lethal, activity change. If one considers that the activity changes must
cover between, say, one and seven orders of magnitude, then the allowed $g_0$ range is
1 to 6 (see eq. (11)).

To answer the question whether the energy statistics is solely dictated by molecular
biophysics or whether it is also influenced by evolutionary requirements, one may compare
the energy statistics of molecules from different thermal environments. The conservation
of the $\delta G_0$ range across psychrophilic and thermophilic molecules would stress the
domination of biophysics factors. Note that our model would then imply different
stochastic evolutionary dynamics, through the width of the activity distribution, for
psychrophilic and thermophilic environments. Conversely, the conservation of $g_0$ would
reveal the importance of evolutionary requirements.
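The two scenarios can be made concrete with the relation $g_0 = \delta G_0/RT$. The sketch below evaluates, for the temperature limits mentioned in the figure 6 caption, either the $g_0$ obtained when $\delta G_0$ is held fixed or the $\delta G_0$ window obtained when $g_0$ is confined to 1 to 6. The gas constant value, the function names and the sample dispersion of 1.7 kcal/mol (taken from the ribozyme fit) are assumptions of this example.

```python
R_KCAL = 1.987e-3  # gas constant in kcal/(mol K), assumed value

def thermal_energy(t_celsius):
    """RT in kcal/mol at a temperature given in degrees Celsius."""
    return R_KCAL * (t_celsius + 273.15)

def g0_at(delta_g0_kcal, t_celsius):
    """Reduced dispersion g0 = delta G_0 / RT for a fixed free energy dispersion."""
    return delta_g0_kcal / thermal_energy(t_celsius)

def delta_g0_window(t_celsius, g0_min=1.0, g0_max=6.0):
    """Free energy dispersion range (kcal/mol) compatible with g0_min < g0 < g0_max."""
    rt = thermal_energy(t_celsius)
    return g0_min * rt, g0_max * rt

if __name__ == "__main__":
    for t in (-20.0, 27.0, 121.0):
        low, high = delta_g0_window(t)
        print(f"T = {t:6.1f} C: g0 = {g0_at(1.7, t):.2f} for 1.7 kcal/mol; "
              f"window {low:.2f} to {high:.2f} kcal/mol for 1 < g0 < 6")
```

If $\delta G_0$ is conserved, the cold environment ends up with the larger $g_0$ and hence the broader activity distribution; if $g_0$ is conserved, the $\delta G_0$ window shifts in proportion to the absolute temperature.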
VIII. CONCLUSIONS

In this paper, we have presented a model for the distri- 
bution of biopolymer activities resulting from mutations 
of a given sequence. The model is characterized by the 
statistics of the energy differences between active con- 
formations and inactive conformations. A similar model 
would be obtained by considering the statistics of acti- 
vation energies. The model fits the measured activity 
distribution of a ribozyme with energy parameters in the 
physically appropriate range. It is also able to reproduce 
commonly observed behaviours such as all or none. 

Importantly, the peak of small activities exhibits three distinct types depending on the
broadness of the distribution of energy differences. Real biopolymers are in a critical
regime allowing the exploration of different ranges of activities in a few mutations
without being too often lethal. This critical regime seems the most favorable evolutionary
regime and could be the statistical engine allowing molecular evolution. Thus the present
work supports the idea that, for evolution to take place, the temperature and the
physico-chemistry dictating the free energy scales of biopolymers must obey a certain
ratio. Finally, it suggests that, by looking at small variations of this ratio, one might
be able to classify biopolymers. One expects, for instance, that biopolymer sequences that
are locked in a shape with a specific function will have smaller $g_0$ than rapidly
evolving biopolymer sequences that could acquire new functions by undergoing major
structural changes. Thus, at the origin of life or during rapidly evolving punctuations,
biopolymers with larger $g_0$ than those characterizing highly optimized, modern RNA and
protein molecules could have contributed to the emergence of novel phenotypes, thus
leading to an increase of complexity.